First steps on migrating to a new version of mongo. #3651
base: master
We're going to write a helper binary for this in Go, because:

- The currently maintained driver package supports mongo versions back to 2.6 (what we're using), which is not true for most other languages.
- We already have Go in our toolchain for BoringSSL's test suite.
- The build is unlikely to break due to bitrot in Go's toolchain.
- The generated binary is static, so if all else fails we can just bundle the executable, though I don't anticipate that.
- I will be much more productive in Go than in something else.
I'm getting a permissions error trying to just run this from my local dev directory; need to figure out what's going on.
We're successfully listing the collections.
Could be related: listing MongoDB collections and exporting them to JSON.
@xet7 I am hoping this particular project may also yield a way to upgrade meteor-spk-built applications to the newest Mongo version.
I also wondered whether it would make any sense to convert raw MongoDB database files directly to another format, without starting a MongoDB server, and whether there would be similar ways to do this for the mongodump file format. But I presume it may not be so useful. https://www.percona.com/blog/2021/05/18/wiredtiger-file-forensics-part-1-building-wt/

It would be nice if all conversions of Sandstorm MongoDB databases were scheduled to happen at night, so that they would not disturb daytime use of apps. These conversions also need to check that there is enough free disk space to convert.
A mongodump (or mongorestore) of a 400 GB MongoDB database takes about 4 hours. Some Snap and Docker WeKan users have databases of that size.
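For planning a nightly conversion window, that figure works out to a rough sustained rate, assuming decimal gigabytes and that the 4 hours covers the whole dump:

```latex
\frac{400\ \text{GB}}{4\ \text{h}} = 100\ \text{GB/h}
  = \frac{100 \times 10^{9}\ \text{B}}{3600\ \text{s}}
  \approx 27.8\ \text{MB/s}
```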
FWIW, the dumping half of this is already written. I'd be wary of dumping it to JSON, though, since Mongo's native format is BSON, which supports a couple of extra data types, like timestamps and binary blobs, so dumping to JSON loses information. I suspect that for Sandstorm itself it won't be too slow; the database isn't that huge, since most data is stored in grains' storage. Though I'd be curious to know how big the database on alpha is (@kentonv?). Note that on my local dev instance the on-disk use of /opt/sandstorm/var/mongo was around 128 MiB, while the exported data was less than 512 KiB, so I assume Mongo is doing some preallocation or something.
Really? Binary blobs are exported in base64-encoded format, like GridFS attachments. Is there more info about this somewhere?
Quoting Lauri Ojansivu (2022-08-02 13:26:21):

> I'd be wary of dumping it to json though, since mongo's native format is bson, which supports a couple extra data types, like timestamps and binary blobs -- so dumping to json loses information.
>
> Really? Binary blobs are exported in base64 encoded format, like GridFS attachments. Is somewhere more info about this?

When you read it back in, how do you tell if it's supposed to be a string or a base64-encoded binary?
That bash script exports each collection/table to a separate JSON file. By opening each file in a text editor, I can see the JSON structure, whether it is nested, etc. For attachments, each part of an attachment has an ID and a base64 string. If a file is divided into many parts, other info in the JSON shows the filename, size, md5, part IDs, etc.

A more useful way would surely be to save attachments as binary files and use their unique file IDs as filenames. It's not so useful to use the real filenames, because many attachments have the same names, special characters are URL-encoded, etc.

Another way would be to name attachments by their sha256 or another hash, and that way deduplicate them and save disk space.

I have also been thinking about encrypting files and other data, but I have not coded it yet. I also have not yet coded scripts to convert JSON etc. to other formats.
You can certainly come up with ways of doing this that are safe for a particular database. But the way this PR does it, we just export as BSON instead of JSON; then we don't have to worry about what the contents of the database actually are.
(In reply to Lauri Ojansivu, 2022-08-02 14:22:45.)
Also make the flag an integer instead of a string.
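The commit message does not say which flag changed, so the following is only a hypothetical illustration of the general point, with a made-up `-port` flag (27017 is MongoDB's default port). Declaring an option with the standard `flag` package's `Int` makes `Parse` reject non-numeric values up front, instead of the program accepting a string and converting it later.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"os"
)

// parseArgs parses a hypothetical helper's arguments. Because the option
// is declared with fs.Int, a value like "abc" fails during Parse.
func parseArgs(args []string) (int, error) {
	fs := flag.NewFlagSet("mongo-helper", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // the caller reports errors itself
	port := fs.Int("port", 27017, "port of the MongoDB server to connect to")
	if err := fs.Parse(args); err != nil {
		return 0, err
	}
	return *port, nil
}

func main() {
	port, err := parseArgs(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Println("connecting to port", port)
}
```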
Per discussion on IRC, we're going to write a helper binary for this in Go, for the reasons listed at the top of this PR.
Right now all this does is bundle up a hello-world Go binary with Sandstorm's build system. Marking as a draft.